Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Chat: brought back syntax highlighting for most common languages #5874

Closed
wants to merge 9 commits into from

Conversation

ichim-david
Copy link
Contributor

@ichim-david ichim-david commented Oct 11, 2024

Test plan

Ask chat to provide an example program for dart language.
You should now see the dart snippets having syntax highlighting which started missing after
the work done in this pull request:
#5232

Example screenshot with the fix in action:
dart-syntax

Changelog

Chat: brought back syntax highlighting for most common languages

… as dart or scss

- Updated rehype-highlight to ^7.0.0 in orde to get rid of lowlight
   2.9.0 and load only version 3.1.0
- Added dependency on highlight.js since we load these extra
  syntax languages and tslint complains if we don't add it
@ichim-david
Copy link
Contributor Author

ichim-david commented Oct 11, 2024

@sqs in #5232 you went from all to the common syntax bundle.
This meant that ppl now complain about missing dart syntax on GitHub and community forum
https://github.com/sourcegraph/cody/issues?q=is%3Aissue+is%3Aopen+syntax

I decided to give it a shot and I've added all of the languages that were part of the old "LANGUAGES" constant
that were missing from the common bunch.
These were:
clojure,
dart,
dockerfile,
elixir,
fortran,
groovy,
haskell,
html,
http,
jsonc,
matlab,
nix,
ocaml,
scala,
verilog,
vhdl,

Lowlight COMMON brings:
'arduino',
'bash',
'c',
'cpp',
'csharp',
'css',
'diff',
'go',
'graphql',
'ini',
'java',
'javascript',
'json',
'kotlin',
'less',
'lua',
'makefile',
'markdown',
'objectivec',
'perl',
'php',
'php-template',
'plaintext',
'python',
'python-repl',
'r',
'ruby',
'rust',
'scss',
'shell',
'sql',
'swift',
'typescript',
'vbnet',
'wasm',
'xml',
'yaml'

in total there are 53 languages supported.

There was also a request for Delphi that I could add an import for as there is someone in the forum asking for it. Depending on how many people would need this, support for Delphi syntax could also be added.
https://community.sourcegraph.com/t/syntax-highlighting/236/2

I believe that this repo could also forgo the loading of the common selection from Lowlight and opt to import exactly what it thinks it should import based on whatever analytics you guys have on what languages are requested with Cody Chat.
Just like they do in common we can also have in utils/highlight.ts.
https://github.com/wooorm/lowlight/blob/main/lib/common.js

Once you decide if the current crop I have made is what you want to support this should also be mentioned in the documentation so that the end users know which languages Cody Chat supports for syntax highlighting.

@ichim-david
Copy link
Contributor Author

ichim-david commented Oct 11, 2024

In the future either an alternative library should be used or extra works need to be done to see if you can read the color scheme of the editor and see if there isn't something that matches.
highlight-themes

There is already an issue that mentions the difference in coloring from the chat to the actual editor color
#5187

vscode/package.json Outdated Show resolved Hide resolved
- As requested in review as it seems there were
  performance issues with rehype-highlight 7.0.0 as
  seen in pull request sourcegraph#5313
@ichim-david
Copy link
Contributor Author

@vovakulikov @dominiccooney I am aware now that the build fails after pinning lowlight to version 2.9.0 since they made the common export the way I've imported from 3.0.
I will fix this issue, initially my pull request was fine however due to me not knowing of issues with "rehype-highlight": "^7.0.0" performance problem upon receiving feedback I've went back to 6.0.0 and wanted to get rid of the extra dependency of lowlight 3.1.
This is not easily seen for a contributor outside of the sourcegraph organisation such as myself where I won't even get any tests running until it's triggered by one of you guys.
Perhaps in the future this can be improved, for now that I know it's my fault with the latest code change I will fix it.

… languages from highligh.js

- This way we can trim the bundle even more by importing only languages that
   should be supported
@@ -56,7 +56,14 @@
"version-bump:patch": "RELEASE_TYPE=patch ts-node-transpile-only ./scripts/version-bump.ts",
"version-bump:dry-run": "RELEASE_TYPE=prerelease ts-node-transpile-only ./scripts/version-bump.ts"
},
"categories": ["AI", "Chat", "Programming Languages", "Machine Learning", "Snippets", "Education"],
"categories": [
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't choose todo this format. Every time I save package.json within vscode I get this autoformat and this time it slipped my guard when I removed lowlight pin.

I don't know how you guys do not get package.json to re-format, looking at the biome config package.json is not an ignored file
https://github.com/sourcegraph/cody/blob/main/biome.jsonc#L82

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this arises because your IDE is formatting it with some different rule... if in VSCode, open that file, cmd-shift-P and Format Document with... and confirm that Biome is the default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dominiccooney indeed I didn't have biome set as default which is a bit strange considering that the biome extension is installed and the biome config is within the Cody repo. At the very least VS Code should have given me the warning that biome isn't set as default for json files popup and ask me if I want to set it as default.
Nevertheless, I've modified the default so this shouldn't be a problem in the future.

@ichim-david
Copy link
Contributor Author

@vovakulikov @dominiccooney I am aware now that the build fails after pinning lowlight to version 2.9.0 since they made the common export the way I've imported from 3.0. I will fix this issue, initially my pull request was fine however due to me not knowing of issues with "rehype-highlight": "^7.0.0" performance problem upon receiving feedback I've went back to 6.0.0 and wanted to get rid of the extra dependency of lowlight 3.1. This is not easily seen for a contributor outside of the sourcegraph organisation such as myself where I won't even get any tests running until it's triggered by one of you guys. Perhaps in the future this can be improved, for now that I know it's my fault with the latest code change I will fix it.

I have removed lowlight dependency opting to import directly the languages from highlight.js which is what lowlight was doing anyway.
This way we don't have to deal with lowlight 2.9.0 which is a dependency of rehype-highlight 6.0.

the list of languages are:
arduino,
bash,
c,
clojure,
cpp,
csharp,
css,
dart,
delphi, (I added this myself in previous commit, the rest of languages are common plus your previous list)
diff,
dockerfile,
elixir,
fortran,
go,
graphql,
groovy,
haskell,
html,
http,
ini,
java,
javascript,
json,
jsonc,
kotlin,
less,
lua,
makefile,
markdown,
matlab,
nix,
objectivec,
ocaml,
perl,
php,
phpTemplate,
plaintext,
python,
pythonRepl,
r,
ruby,
rust,
scala,
scss,
shell,
sql,
swift,
typescript,
vbnet,
verilog,
vhdl,
wasm,
xml,
yaml,

At this point it is up to you guys on which languages you want to support and which not, if you are concerned with bundle size and you have some analytics on what people use as the most common languages then by all means let me know and I can trim this list even further.

Copy link
Contributor

@dominiccooney dominiccooney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me.

How does this affect the plugin size?

It would be great to have a test that quickly confirmed some colors appear for a few major and minor kinds of files in this list.

@ichim-david
Copy link
Contributor Author

This looks good to me.

How does this affect the plugin size?

It would be great to have a test that quickly confirmed some colors appear for a few major and minor kinds of files in this list.

@dominiccooney I'll run the vite-analyzer and check. I am still waiting to hear from you guys on which syntaxes you think it actually make sense to support. Besides the 17 new ones loaded I question the usefulness of a bunch of languages that were previously loaded from the common bundle. If you can give me a list of things you know you want to have syntax support for I can modify the pull request to only load those.
Thank you for your feedback!

@dominiccooney
Copy link
Contributor

Patched this locally and tried it, it's great. I have pushed it here to run CI: #5953

@dominiccooney
Copy link
Contributor

@ichim-david regarding syntaxes to support, I don't have data about generated code languages handy, but here's the top 50 languages for autocomplete suggestions.

Obviously some of these don't make sense, like we're not going to syntax highlight "txt" output.

Regarding the existing languages being dubious, which ones specifically? I would be reticent to take a language away unless the highlighting quality is poor or it is super expensive for some reason.

python
typescriptreact
typescript
javascript
php
java
javascriptreact
go
vue
dart
html
markdown
css
csharp
cpp
kotlin
yaml
rust
c
scss
json
ruby
plaintext
sql
blade
svelte
shellscript
xml
lua
scala
properties
terraform
elixir
powershell
dockerfile
astro
dockercompose
apex
jsonc
prisma
latex
perl
groovy
luau
hcl
erb
zig
ini
txt
twig

…ild error due to loading of such library"

This reverts commit 9e85405. Upon running pnpm install again in the root pnpm
build no longer complained about the loading of the external library.
@ichim-david
Copy link
Contributor Author

ichim-david commented Oct 21, 2024

Patched this locally and tried it, it's great. I have pushed it here to run CI: #5953

@dominiccooney you've asked about the bundle difference. For Cody web the syntaxes are saved and loaded within dist/index.js
With these extra languages loaded:
dist/index.js 2,605.76 kB │ gzip: 581.65 kB
Without:
dist/index.js 2,542.84 kB │ gzip: 564.06 kB
So a 17.59kb increase to add syntax support for the following languages:
clojure,
dart,
delphi,
dockerfile,
elixir,
fortran,
groovy,
haskell,
html,
http,
jsonc,
matlab,
nix,
ocaml,
scala,
verilog,
vhdl,

@ichim-david
Copy link
Contributor Author

ichim-david commented Oct 21, 2024

@ichim-david regarding syntaxes to support, I don't have data about generated code languages handy, but here's the top 50 languages for autocomplete suggestions.

Obviously some of these don't make sense, like we're not going to syntax highlight "txt" output.

Regarding the existing languages being dubious, which ones specifically? I would be reticent to take a language away unless the highlighting quality is poor or it is super expensive for some reason.

python typescriptreact typescript javascript php java javascriptreact go vue dart html markdown css csharp cpp kotlin yaml rust c scss json ruby plaintext sql blade svelte shellscript xml lua scala properties terraform elixir powershell dockerfile astro dockercompose apex jsonc prisma latex perl groovy luau hcl erb zig ini txt twig

@dominiccooney to finish this logic I've looked at your list and found some languages that there are syntaxes available and some that aren't.

Syntax not found
apex
typescriptreact (there is typescript and js)
javascriptreact
vue
blade
svelte
shellscript (we load shell)
terraform
astro
dockercompose
apex
prisma
hcl
zig

Languages with support from highlight.js (https://github.com/highlightjs/highlight.js/tree/main/src/languages)
and not loaded but could be if desired:
julia
powershell
latex
erb
properties
twig

EDIT:
Based on the previous build and the number of new languages added and the bundle size increase it seems like
there is around 1kb of gzipped js loaded per loaded language.

beyang pushed a commit that referenced this pull request Nov 12, 2024
Authored by @ichim-david in #5874. This branch created to run CI.

## Test plan
Ask chat to provide an example program for dart language. 
You should now see the dart snippets having syntax highlighting which
started missing after
the work done in this pull request:
#5232

Example screenshot with the fix in action:

![dart-syntax](https://github.com/user-attachments/assets/78b0f19a-2c6c-4789-93d7-ef087eae7862)

## Changelog
Chat: brought back syntax highlighting for most common languages

---------

Co-authored-by: David Ichim <[email protected]>
@beyang
Copy link
Member

beyang commented Nov 12, 2024

Merged in #5953.

Thank you for the contribution and apologies for the delay @ichim-david!

@beyang beyang closed this Nov 12, 2024
@ichim-david ichim-david deleted the syntax_highlighting branch November 13, 2024 21:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants